Nominal Logistic Regression

Statistics for Data Science II

Introduction

  • We have discussed binary logistic regression for binomial outcomes,

\ln \left( \frac{\pi}{1-\pi} \right) = \beta_0 + \beta_1 x_1 + ... + \beta_k x_k

  • We have also discussed ordinal logistic regression for ordinal outcomes,

\ln \left( \frac{\pi_1 + ... + \pi_j }{\pi_{j+1} + ... + \pi_{c}} \right) = \hat{\beta}_{0j} + \hat{\beta}_{1} x_1 + ... + \hat{\beta}_{k} x_k

  • Today, we will discuss nominal logistic regression for multinomial outcomes,

\ln \left( \frac{\pi_j}{\pi_{\text{ref}}} \right) = \hat{\beta}_{0j} + \hat{\beta}_{1j} x_1 + ... + \hat{\beta}_{kj} x_k

Nominal (or Multinomial) Logistic Regression

  • Like in ordinal logistic regression, we will create c-1 models.

    • Unlike ordinal logistic regression, we no longer assume proportional odds.

    • This means that we now have different slopes for each model constructed.

\ln \left( \frac{\pi_j}{\pi_{\text{ref}}} \right) = \hat{\beta}_{0j} + \hat{\beta}_{1j} x_1 + ... + \hat{\beta}_{kj} x_k

  • We will now use the multinom() function from the nnet package.
library(nnet)
m <- multinom(outcome ~ var_1 + var_2 + ... + var_k, 
              data = dataset)

Example

Let’s consider data from the General Social Survey, relating political ideology to political party affiliation. Political ideology has a five-point scale, ranging from very liberal (Y=1) to very conservative (Y=5); although the scale is ordered, we will treat it as nominal here for illustration. Let x be an indicator variable for political party, with x = 1 for Republicans and x = 0 for Democrats. We will construct a nominal logistic regression model that models political ideology as a function of political party and sex.

library(gsheet)
library(tidyverse)
gss <- gsheet2tbl("https://docs.google.com/spreadsheets/d/1QgTiSaxVkZvLs9guX-pnAxln0hn4Xtsa1DiRGjcKXsM/edit?usp=sharing") %>%
  mutate(Ideology = as.factor(Ideology))
head(gss)

Example

  • Let’s model ideology as a function of political party affiliation and sex.
library(nnet)
m1 <- multinom(Ideology ~ Party + Sex,
               data = gss)
# weights:  20 (12 variable)
initial  value 1343.880657 
iter  10 value 1234.606706
final  value 1231.166030 
converged
summary(m1)
Call:
multinom(formula = Ideology ~ Party + Sex, data = gss)

Coefficients:
                      (Intercept) PartyRepublican     SexMale
2 - Liberal            0.06688293       0.4241904 -0.13565500
3 - Moderate           0.89856922       0.8609859 -0.37175538
4 - Conservative      -0.73965106       1.6869369  0.16265398
5 - Very Conservative -0.40728933       1.5633886  0.07632974

Std. Errors:
                      (Intercept) PartyRepublican   SexMale
2 - Liberal             0.1902369       0.2833214 0.2645717
3 - Moderate            0.1623610       0.2426818 0.2272924
4 - Conservative        0.2255970       0.2871522 0.2694358
5 - Very Conservative   0.2067078       0.2727859 0.2572010

Residual Deviance: 2462.332 
AIC: 2486.332 

\begin{align*} \ln \left( \frac{\pi_{\text{Lib}}}{\pi_{\text{V. Lib}}} \right) &= 0.07 + 0.42 \text{ republican} - 0.14 \text{ male} \\ \ln \left( \frac{\pi_{\text{Mod}}}{\pi_{\text{V. Lib}}} \right) &= 0.90 + 0.86 \text{ republican} - 0.37 \text{ male} \\ \ln \left( \frac{\pi_{\text{Cons}}}{\pi_{\text{V. Lib}}} \right) &= -0.74 + 1.69 \text{ republican} + 0.16 \text{ male} \\ \ln \left( \frac{\pi_{\text{V. Cons}}}{\pi_{\text{V. Lib}}} \right) &= -0.41 + 1.56 \text{ republican} + 0.08 \text{ male} \end{align*}

Interpretations

  • For a one [predictor unit] increase in [predictor], the odds of [response category j], as compared to [the reference category], are multiplied by e^{\hat{\beta}_i}.

  • For a one [predictor unit] increase in [predictor], the odds of [response category j], as compared to [the reference category], are [increased or decreased] by [100(e^{\hat{\beta}_i}-1)% or 100(1-e^{\hat{\beta}_i})%].

  • As compared to [reference category of predictor], the odds of [response category j], as compared to [reference category of outcome], for [predictor category of interest] are multiplied by e^{\hat{\beta}_i}.

  • As compared to [reference category of predictor], the odds of [response category j], as compared to [reference category of outcome], for [predictor category of interest] are [increased or decreased] by [100(e^{\hat{\beta}_i}-1)% or 100(1-e^{\hat{\beta}_i})%].
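Both phrasings come from the same exponentiated coefficient. As a quick numeric sketch, using the PartyRepublican coefficient for very conservative vs. very liberal from the example that follows:

```r
# The PartyRepublican coefficient for "5 - Very Conservative" vs.
# "1 - Very Liberal" from the fitted model on the next slides.
beta_hat <- 1.5633886

odds_ratio <- exp(beta_hat)              # multiplicative change in odds
pct_change <- 100 * (exp(beta_hat) - 1)  # percent increase when OR > 1

round(odds_ratio, 2)  # 4.77
round(pct_change)     # 377
```

Either interpretation is correct; the percent-change form is often easier for non-statistical audiences to read.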

Example

  • Let’s look at interpreting our models.
round(exp(coefficients(m1)), 2)
                      (Intercept) PartyRepublican SexMale
2 - Liberal                  1.07            1.53    0.87
3 - Moderate                 2.46            2.37    0.69
4 - Conservative             0.48            5.40    1.18
5 - Very Conservative        0.67            4.77    1.08
  • Specific: As compared to someone who identifies as a Democrat, someone who identifies as a Republican has a 377% increase in the odds of reporting a very conservative political ideology as compared to very liberal.

  • More general: As compared to Democrats, those who identify as Republican have increased odds of reporting more conservative political ideologies (relative to very liberal).
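The fitted logits also determine the category probabilities themselves. As a hand-computed sketch for a Republican male, using the coefficients reported above (reference category: 1 - Very Liberal); in practice, predict(m1, type = "probs") returns the same quantities:

```r
# Coefficients from the summary output, rows j = 2, ..., 5.
b0     <- c( 0.06688293, 0.89856922, -0.73965106, -0.40728933)  # intercepts
b_rep  <- c( 0.42419040, 0.86098590,  1.68693690,  1.56338860)  # PartyRepublican
b_male <- c(-0.13565500, -0.37175538,  0.16265398,  0.07632974) # SexMale

eta   <- b0 + b_rep * 1 + b_male * 1  # linear predictors for a Republican male
denom <- 1 + sum(exp(eta))            # reference category contributes exp(0) = 1
probs <- c(1, exp(eta)) / denom       # probabilities for categories 1, ..., 5

round(probs, 3)
sum(probs)  # 1
```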

Inference - Hypothesis Testing

  • As we saw, the results from summary() do not include hypothesis test results.
  • To determine global significance of each predictor, we will use full/reduced (likelihood ratio) tests via car::Anova().
car::Anova(model, type = 3)
  • We will construct p-values “by hand” for the individual models.
m <- multinom(outcome ~ var_1 + var_2 + ... + var_k, data = dataset)
z <- summary(m)$coefficients/summary(m)$standard.errors # construct z statistics
p <- (1 - pnorm(abs(z)))*2 # two-sided p-values
t(p) # transpose so each model is a column

Example

  • Let’s explore significance for our models.
  • To determine global significance, we will use car::Anova().
car::Anova(m1, type = 3)
  • We construct p-values “by hand” for the individual models.
z <- summary(m1)$coefficients/summary(m1)$standard.errors # construct z
p <- (1 - pnorm(abs(z)))*2 # construct p-values
t(p) # transpose to columns
                2 - Liberal 3 - Moderate 4 - Conservative 5 - Very Conservative
(Intercept)       0.7251555 3.123147e-08     1.043091e-03          4.879684e-02
PartyRepublican   0.1343397 3.884666e-04     4.235763e-09          9.972635e-09
SexMale           0.6081371 1.019271e-01     5.460540e-01          7.666416e-01
  • Globally, only political party is a significant predictor of political ideology (p < 0.001).

  • This holds true when comparing moderate (p < 0.001), conservative (p < 0.001), and very conservative (p < 0.001) to very liberal.

Inference - Hypothesis Testing

  • What if we are interested in comparing against, say, moderate ideology?

    • We would restructure the data so that moderate is the first level of the outcome factor, which R treats as the reference category.
  • Global significance will not change.

  • Model level significance will change.

    • This is because we are now comparing the outcomes to moderates, instead of very liberal.
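One way to change the reference category is relevel(), which moves the chosen level to the front of the factor. A minimal sketch on a toy factor (the level labels below mirror those in the output above):

```r
# By default, the first level (alphanumerically) is the reference category.
ideology <- factor(c("1 - Very Liberal", "3 - Moderate", "5 - Very Conservative"))
levels(ideology)[1]   # "1 - Very Liberal"

# relevel() makes the requested level first, i.e., the reference category.
ideology2 <- relevel(ideology, ref = "3 - Moderate")
levels(ideology2)[1]  # "3 - Moderate"

# Refitting with the releveled outcome would then compare against moderate, e.g.:
# m2 <- multinom(Ideology ~ Party + Sex,
#                data = mutate(gss, Ideology = relevel(Ideology, ref = "3 - Moderate")))
```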

Inference - Confidence Intervals

  • Like before, we can construct confidence intervals using the confint() function.

  • We, of course, want the confidence intervals of the odds ratios.

round(exp(confint(m1)),2)
, , 2 - Liberal

                2.5 % 97.5 %
(Intercept)      0.74   1.55
PartyRepublican  0.88   2.66
SexMale          0.52   1.47

, , 3 - Moderate

                2.5 % 97.5 %
(Intercept)      1.79   3.38
PartyRepublican  1.47   3.81
SexMale          0.44   1.08

, , 4 - Conservative

                2.5 % 97.5 %
(Intercept)      0.31   0.74
PartyRepublican  3.08   9.49
SexMale          0.69   2.00

, , 5 - Very Conservative

                2.5 % 97.5 %
(Intercept)      0.44   1.00
PartyRepublican  2.80   8.15
SexMale          0.65   1.79
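These are Wald intervals on the log-odds scale, exponentiated. As a sketch, we can reproduce the PartyRepublican interval for very conservative by hand from the coefficient and standard error in the summary output:

```r
b  <- 1.5633886  # PartyRepublican coefficient for "5 - Very Conservative"
se <- 0.2727859  # its standard error, both from the summary output

ci_log_odds <- b + c(-1, 1) * qnorm(0.975) * se  # Wald interval on log-odds scale
round(exp(ci_log_odds), 2)  # 2.80 8.15, matching the confint() output above
```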

Wrap Up

  • We have now covered logistic regression for all types of categorical outcomes.

    • Two responses \to binary logistic regression.

    • More than two ordered* responses \to ordinal logistic regression.

      • *: if the proportional odds assumption is not met, ignore the ordering and fit a nominal model.
    • More than two unordered responses \to nominal logistic regression.

  • Note that we have learned the models with a logit link function.

    • We can also use probit and complementary log log (cloglog) link functions.

    • You can read a discussion about the differences on Stack Overflow.